test(envelope-contract): pin producer→consumer envelope contracts (closes #114) #178
Merged
…oses #114)

Adds 25 contract tests that pin the envelope shapes each dispatch tier emits against the handler-side parsers that consume them. A future divergence on either side then fails fast in CI instead of slipping through to production.

Background: handler integration tests stub `runAgentDevice` via `_setRunAgentDeviceForTest` with synthetic envelopes. Codex flagged on PR #109 (conf 80) that this short-circuits the real wrapper's three dispatch tiers, each of which produces subtly different shapes. The original issue listed agent-device's internal tiers (fast-runner HTTP / daemon socket / CLI). After PR #164 (iOS-MVP) and PR #165 (Android-MVP), the surface widened to include the in-tree iOS and Android runner clients too. This PR covers the current surface.

Producers pinned:

1. In-tree iOS runner (`rn-fast-runner-client.runIOS`): flat nodes with the `{ref: '@e<n>', type, rect, label?, identifier?, enabled?, hittable?}` shape after `mapRunnerNodesToFlat` normalization.
2. In-tree Android runner (`rn-android-runner-client.runAndroid`): identical flat-node shape. The parity test pins this, because a divergence here would silently break platform-agnostic handlers.
3. Legacy upstream agent-device daemon socket: flat nodes with less metadata.
4. Legacy upstream agent-device CLI subprocess: a separate fixture even though its current shape equals the daemon's, so a future divergence would surface here.
5. Legacy upstream agent-device internal fast-runner sub-tier: nested-tree shape, NOT flat. The `env.data.tree` branch of `findRefByTestID` handles this, so removing that branch without warning would fail this test.
6. iOS `XCUIElement.typeText` runner-timeout shim: `{ok:true, data: {typed, text}, meta: {sideEffectSucceeded, runnerTimeoutShim}}`. `snapshotEnvelopeFailed` must NOT report this as a failure. Otherwise every successful iOS fill would be routed to SNAPSHOT_FAILED.

Consumers exercised:

- `findRefByTestID` (device-batch.ts): both the flat-nodes and nested-tree branches.
- `snapshotEnvelopeFailed` (device-batch.ts): including the critical distinction between empty-nodes success (TESTID_NOT_FOUND) and snapshot-infrastructure failure (SNAPSHOT_FAILED), per Phase 128 #5/#6.
- Edge cases: null/undefined/empty/malformed JSON are all classified as failed.

codex-pair caught three fidelity issues during review (MED):

- My initial fixtures used `ref: 'app-0'` style refs with `parentIndex`/`depth` fields. The actual `mapRunnerNodesToFlat` output emits `@e<n>` refs with `type`/`rect`/`enabled`/`hittable`.
- The failure fixture used the raw HTTP error shape `{error: {message, code}}`. MCP consumers actually see the post-`failResult` shape `{ok:false, error: string, code: string}`.
- The comment claimed daemon and CLI were pinned separately, but only daemon was.

All three were fixed before commit. A contract test with the wrong fixtures is worse than no contract test because it gives false confidence.

Verified: 1506/1506 cdp-bridge unit tests passing (+25 net new).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
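The following is a minimal TypeScript sketch of the two consumer contracts described above, written to make the flat/tree branching and the failure classification concrete. It is not the code in `device-batch.ts`. The function bodies, the `*Sketch` names, the `data.nodes` container for flat nodes, and a `ref` on nested-tree nodes are all assumptions made for illustration.

```typescript
// Hypothetical sketch, NOT the real device-batch.ts implementation.
// Assumed: flat nodes live under `data.nodes`, nested-tree nodes carry `ref`.
type FlatNode = {
  ref: string; // '@e<n>' after mapRunnerNodesToFlat
  type?: string;
  identifier?: string;
  label?: string;
};

type TreeNode = {
  ref?: string;
  identifier?: string;
  children?: TreeNode[];
};

type SnapshotEnvelope = {
  ok: boolean;
  data?: { nodes?: FlatNode[]; tree?: TreeNode; [key: string]: unknown };
  meta?: Record<string, unknown>;
  error?: string;
  code?: string;
};

// null, undefined, empty or malformed JSON text, and ok:false all count as
// a failed snapshot. Only an explicit ok:true is success.
function snapshotEnvelopeFailedSketch(input: unknown): boolean {
  let env: unknown = input;
  if (typeof env === "string") {
    try {
      env = JSON.parse(env); // throws a SyntaxError on "" and on malformed JSON
    } catch {
      return true;
    }
  }
  if (env === null || typeof env !== "object") return true;
  // An empty node list and the typeText runner-timeout shim are both ok:true,
  // so neither is reported as a snapshot failure.
  return (env as { ok?: unknown }).ok !== true;
}

function findRefByTestIDSketch(
  env: SnapshotEnvelope | null | undefined,
  testID: string,
): string | null {
  // Refuse to scan a failed (or missing) snapshot.
  if (!env || env.ok !== true || !env.data) return null;
  const { nodes, tree } = env.data;
  // Flat-node branch: in-tree iOS/Android runners, daemon socket, CLI.
  if (Array.isArray(nodes)) {
    return nodes.find((n) => n.identifier === testID)?.ref ?? null;
  }
  // Nested-tree branch: legacy fast-runner sub-tier (iterative depth-first walk).
  const stack: TreeNode[] = tree ? [tree] : [];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (node.identifier === testID && node.ref) return node.ref;
    stack.push(...(node.children ?? []));
  }
  return null;
}
```

A caller would check `snapshotEnvelopeFailedSketch` first and map `true` to SNAPSHOT_FAILED. A `null` from `findRefByTestIDSketch` becomes TESTID_NOT_FOUND only on a healthy snapshot, which keeps the two error codes from being conflated.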
Summary
Closes #114 — adds 25 contract tests that pin the envelope shapes each dispatch tier emits against the handler-side parsers that consume them. A future divergence on either side fails fast in CI rather than slipping through to prod.
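As an illustration of what pinning a flat-node producer shape can look like, here is a minimal TypeScript sketch. The required and optional key lists mirror the `{ref: '@e<n>', type, rect, label?, identifier?, enabled?, hittable?}` shape quoted in this PR. The helper names `flatNodeViolations` and `sameKeySet`, and the strict "no unexpected keys" rule, are hypothetical and are not taken from the actual test file.

```typescript
// Hypothetical shape pin for flat-node producer fixtures.
// Key lists follow the shape quoted in this PR; helper names are illustrative.
const REQUIRED_KEYS = ["ref", "type", "rect"] as const;
const OPTIONAL_KEYS = ["label", "identifier", "enabled", "hittable"] as const;
const ALLOWED_KEYS = new Set<string>([...REQUIRED_KEYS, ...OPTIONAL_KEYS]);
const REF_PATTERN = /^@e\d+$/;

// Returns human-readable violations; an empty array means the node matches.
function flatNodeViolations(node: Record<string, unknown>): string[] {
  const violations: string[] = [];
  for (const key of REQUIRED_KEYS) {
    if (!(key in node)) violations.push(`missing ${key}`);
  }
  for (const key of Object.keys(node)) {
    if (!ALLOWED_KEYS.has(key)) violations.push(`unexpected ${key}`);
  }
  const ref = node.ref;
  if ("ref" in node && (typeof ref !== "string" || !REF_PATTERN.test(ref))) {
    violations.push(`ref ${String(ref)} is not @e<n>`);
  }
  return violations;
}

// Parity: iOS and Android fixtures for the same element should expose the
// same key set, so platform-agnostic handlers only ever see one shape.
function sameKeySet(a: Record<string, unknown>, b: Record<string, unknown>): boolean {
  const keys = (o: Record<string, unknown>) => Object.keys(o).sort().join(",");
  return keys(a) === keys(b);
}
```

A check like this flags a fixture with `ref: 'app-0'`, `parentIndex`, and `depth` three times: once for the ref pattern and once for each unexpected key. That is the fixture mistake codex-pair raised in review.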
Background
Handler integration tests stub `runAgentDevice` via `_setRunAgentDeviceForTest` with synthetic envelopes. Codex flagged on PR #109 (conf 80) that this short-circuits the real wrapper's dispatch tiers, each of which produces subtly different shapes. The original issue listed agent-device's internal tiers (fast-runner HTTP / daemon socket / CLI subprocess). After PR #164 (iOS-MVP) and PR #165 (Android-MVP), the surface widened to include the in-tree iOS and Android runner clients too. This PR pins the current surface.

Producer fixtures pinned
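1. In-tree iOS runner (`rn-fast-runner-client.runIOS`): flat nodes with the `{ref: '@e<n>', type, rect, label?, identifier?, enabled?, hittable?}` shape after `mapRunnerNodesToFlat` normalization
2. In-tree Android runner (`rn-android-runner-client.runAndroid`): identical flat-node shape, pinned by a parity test
3. Legacy upstream agent-device daemon socket: flat nodes with less metadata
4. Legacy upstream agent-device CLI subprocess: separate fixture, even though its current shape equals the daemon's
5. Legacy upstream agent-device internal fast-runner sub-tier: nested-tree shape, handled by the second branch of `findRefByTestID`; removing that branch would fail this test
6. iOS `XCUIElement.typeText` runner-timeout shim: `{ok:true, data: {typed, text}, meta: {sideEffectSucceeded, runnerTimeoutShim}}`

Consumers exercised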
- `findRefByTestID`: both the flat-nodes and nested-tree branches, plus the empty-nodes and no-match returns
- `snapshotEnvelopeFailed`: the critical Phase 128 #5/#6 distinction between empty-nodes success (`TESTID_NOT_FOUND` downstream) and snapshot-infrastructure failure (`SNAPSHOT_FAILED`)
- `null`/`undefined`/empty string/malformed JSON: all classified as failed

What codex-pair caught during review
Three MED fidelity issues — a contract test with wrong fixtures is worse than no contract test because it gives false confidence. All fixed pre-commit:
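- The initial fixtures used `ref: 'app-0'` style refs with `parentIndex`/`depth`, but `mapRunnerNodesToFlat` emits `@e<n>` refs with `type`/`rect`/`enabled`/`hittable`. This was verified by reading both `rn-fast-runner-client.ts:488` and `rn-android-runner-client.ts:230`.
- The failure fixture used the raw HTTP error shape `{error: {message, code}}`, but MCP consumers see the post-`failResult` shape `{ok:false, error: string, code: string}` (per `failResult(message, code)` at `rn-fast-runner-client.ts:564`).
- A comment claimed the daemon and CLI tiers were pinned separately, but only the daemon was.

Test plan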
- `findRefByTestID` consumer → resolves the expected ref by `identifier`
- `findRefByTestID` → returns null (refuses to scan a failed snapshot)
- `snapshotEnvelopeFailed` → returns false (success)
- `snapshotEnvelopeFailed` → returns true

Refs
- `scripts/cdp-bridge/src/runners/rn-fast-runner-client.ts:488` (`mapRunnerNodesToFlat`)
- `scripts/cdp-bridge/src/runners/rn-android-runner-client.ts:230` (Android `mapRunnerNodesToFlat`)
- `scripts/cdp-bridge/src/tools/device-batch.ts:61` (`findRefByTestID`)
- `scripts/cdp-bridge/src/tools/device-batch.ts:111` (`snapshotEnvelopeFailed`)

🤖 Generated with Claude Code